Image processing based modeling for Rosa roxburghii fruits mass and volume estimation

The mass and volume of Rosa roxburghii fruits are essential for fruit grading and consumer selection. Physical characteristics such as dimension, projected area, mass, and volume are interrelated. Image-based mass and volume estimation facilitates the automation of fruit grading, which can replace time-consuming and laborious manual grading. In this study, image processing techniques were used to extract fruit dimensions and projected areas, and univariate (linear, quadratic, exponential, and power) and multivariate regression models were used to estimate the mass and volume of Rosa roxburghii fruits. The results showed that the quadratic model based on the criterion projected area (CPA) gave the best mass estimate (R² = 0.981) with an accuracy of 99.27%; the equation is M = 0.280 + 0.940CPA + 0.071CPA². The multivariate regression model based on three projected areas (PA1, PA2, and PA3) gave the best volume estimate (R² = 0.898) with an accuracy of 98.24%; the equation is V = −8.467 + 0.657PA1 + 1.294PA2 + 0.628PA3. In practical applications, cost savings can be realized by using only one camera position. Therefore, when lower accuracy is acceptable, estimating mass and volume simultaneously from only the dimensional information of the side view or the projected area information of the top view is recommended.

physical characteristics of fruits by computer vision has been very high. It has been widely used in grading operations [14][15][16][17]. Computers replace human labor, automatically extract physical characteristics, make fast mass and volume estimations, control machinery for grading operations, and minimize errors 18,19. Image processing technology is widely applied in various fields, such as robot navigation, aircraft navigation, medical scanning, industrial measurement, and agricultural harvesting and processing [20][21][22][23]. Researchers have given greater attention to processing images by computer to build mathematical models that describe fruit morphology 24. For example, Khoshnam et al. used image processing software that can directly obtain the diameters and projected areas of pomegranates in three mutually perpendicular directions. Based on the projected areas, they obtained a mass estimation model with a coefficient of determination (R²) of up to 0.96 25. Mansouri et al. obtained a mathematical model for optimal melon seed mass estimation from three-dimensional information (length, width, and height) of melon seeds extracted from images 26. However, when obtaining dimension information, it is often necessary to align the apparent dimension (maximum or minimum diameter) of the fruit with the horizontal or vertical direction of the image, and in practice the placement cannot be made uniform.
Therefore, this paper presents a method for determining the dimensions of near-ellipsoidal fruits from images. The main research objective is to find the optimal estimation model for the mass and volume of Rosa roxburghii fruits based on the dimensional and projected area information of 2D images.

Material acquisition
Newly harvested Rosa roxburghii fruits were picked and purchased from an orchard in Longli County, Guizhou Province, China. The experiments were performed after the harvested fruits had been carefully delivered to the laboratory and refrigerated at 8 ℃ for 24 h. Physical measurements were made on 60 randomly chosen Rosa roxburghii fruits with no obvious surface defects or damage.
Figure 1 shows the process of image-based estimation of Rosa roxburghii mass and volume. Before image acquisition, the volume and mass of the Rosa roxburghii samples are determined manually. Next, the pixel ratio per unit area is obtained, and pictures are taken along the three coordinate axes of the fruit shown in the figure. The mass (M) of each fruit was measured on a digital balance with an accuracy of 0.01 g. The volume (V) was measured by the water displacement method, with the water temperature kept at 25 ℃ 27. Before selecting these 60 fruits, we made a preliminary classification into three categories: large, medium, and small. The parameter values of the 20 fruits selected for each category are shown in Table 1. The image processing and segmentation step reduces noise in the picture and removes the prickles (de-pricking) so that the shape of the fruit can be effectively segmented from the background 28. With the above processing, dimensional and projected area measurements were obtained and used to build estimation models for volume and mass.

Image acquisition
Image acquisition was carried out on a horizontal experimental bench (Fig. 2), and the images of the Rosa roxburghii fruits were obtained by a Canon G1 X camera whose basic parameters were set to a focal length of f = 35 mm and a sensor size of 18.7 × 14.0 mm. The dimensions extracted from the image are measured in pixels. Therefore, the number of pixels per unit length and per unit area must be obtained so that pixel measurements can be converted to actual dimensions and areas using pre-determined constant ratios. For the camera used in this work, any object captured from a constant distance of H = 500 mm was measured to have 0.016 pixels per 1 mm² and 0.126 pixels per 1 mm.
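The conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the two constants are used exactly as quoted in the text (pixels per millimetre and pixels per square millimetre). If the calibration ratios are instead expressed as millimetres per pixel, the divisions become multiplications. The sketch also checks that the two quoted constants agree with each other, since an areal scale should equal the square of the linear scale.

```python
# Calibration constants as quoted in the text for a shooting distance H = 500 mm.
PX_PER_MM = 0.126    # linear scale (pixels per 1 mm), as reported
PX_PER_MM2 = 0.016   # areal scale (pixels per 1 mm^2), as reported


def pixels_to_mm(length_px, px_per_mm=PX_PER_MM):
    """Convert a length measured in pixels to millimetres."""
    return length_px / px_per_mm


def pixels_to_mm2(area_px, px_per_mm2=PX_PER_MM2):
    """Convert a pixel count (area) to square millimetres."""
    return area_px / px_per_mm2


def areal_scale_from_linear(px_per_mm):
    """An areal scale is the square of the corresponding linear scale."""
    return px_per_mm ** 2


if __name__ == "__main__":
    # 0.126 squared is about 0.0159, which rounds to the quoted 0.016.
    print(areal_scale_from_linear(PX_PER_MM))
```

Keeping the ratios as named parameters makes it easy to recalibrate when the camera distance changes.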

Image preprocessing
Image preprocessing is necessary to improve image quality, mainly to reduce the impact of impurities and illumination in complex environments and to prepare for the subsequent segmentation of the complete Rosa roxburghii fruit contour. In this paper, the processing method of Javidan et al. is used to preprocess the Rosa roxburghii images; the method is also effective in filling the voids produced in the images 29,30. Because of the special external characteristics of Rosa roxburghii fruits, and because the ultimate goal is a binary image with the prickles removed, we made some changes to the image processing steps for our study. The original image (Fig. 3a) contains shadows caused by lighting, which can seriously affect segmentation because the shadows are segmented along with the target object. In Fig. 3b, the saturation and contrast are adjusted to increase the color difference between the fruit and the background, making subsequent de-pricking and denoising easier. In Fig. 3c, a 5 × 5 median filter is used for de-pricking and denoising. Shadows are removed by adjusting the gray-value threshold (Fig. 3d, gray image with an appropriate threshold). The Canny operator is used for edge detection to obtain clear edge lines and to facilitate the filling of voids, and boundary tracking is used to connect the boundaries (Fig. 3e). Figure 3f shows the filled binary image. Figure 3g shows the complete Rosa roxburghii binary image. The contours and pixel areas of the fruit are kept as complete as possible.
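Two of the steps above, the 5 × 5 median filter and the gray-level threshold, can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names are hypothetical, edge padding at the image border is an assumption, and the threshold polarity (fruit brighter than background) is also an assumption that would be inverted for a bright background. The Canny and void-filling steps are not covered here.

```python
import numpy as np


def median_filter_5x5(gray):
    """Replace each pixel by the median of its 5 x 5 neighbourhood.

    Thin structures such as prickles or isolated noise pixels occupy only a
    few cells of the window, so the median suppresses them.
    Border handling (edge replication) is an assumption of this sketch.
    """
    padded = np.pad(gray, 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (5, 5))
    return np.median(windows, axis=(-2, -1))


def threshold(gray, level):
    """Binary image: 1 where the gray value exceeds `level`, else 0.

    Assumes the fruit is brighter than the background after preprocessing.
    """
    return (gray > level).astype(np.uint8)


if __name__ == "__main__":
    image = np.zeros((20, 20))
    image[5:15, 5:15] = 200.0   # synthetic "fruit"
    image[1, 1] = 255.0         # isolated bright pixel standing in for a prickle
    binary = threshold(median_filter_5x5(image), 100.0)
    print(binary.sum())
```

A larger window removes thicker prickles but also rounds off the fruit contour, which is the trade-off behind the choice of window size.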

Rosa roxburghii image segmentation
Image segmentation is used to extract the shape of the Rosa roxburghii fruits from the background to facilitate the acquisition of a complete binary image 31. Two methods were compared. The first is HSV color space segmentation: because the RGB color space describes an image as a combination of red, green, and blue and is more sensitive to changes in light, the HSV color space can be used instead, and the fruit is separated from the background by setting thresholds for hue and saturation. The second is K-means clustering segmentation: because the fruit's color differs significantly from the background, more than two clusters are set to segment the fruit. As shown in Fig. 4b,c, these two methods are used to segment the preprocessed Rosa roxburghii fruit image (Fig. 4a). However, with HSV color space segmentation, the saturation threshold must be set differently in different lighting environments, the edge of the fruit is not defined very accurately, and finding the optimal threshold is time-consuming. Therefore, we use K-means clustering segmentation and apply the same settings for large-scale segmentation processing.
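As an illustration of K-means color segmentation, the sketch below clusters the RGB values of an image with a plain NumPy implementation. It is a simplified stand-in rather than the procedure used in the study: the function names, the brightness-based initialization, the fixed iteration count, and the rule that the cluster of the top-left corner pixel is the background are all assumptions made for this example.

```python
import numpy as np


def kmeans_labels(pixels, k=3, iters=20):
    """Cluster rows of `pixels` (N x 3 float array) into k groups."""
    # Deterministic initialization (an assumption of this sketch):
    # pick k pixels evenly spaced along the brightness ordering.
    order = np.argsort(pixels.sum(axis=1))
    centers = pixels[order[np.linspace(0, len(pixels) - 1, k).astype(int)]].copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers


def segment_fruit(rgb, k=3):
    """Return a 0/1 mask that is 1 for every pixel outside the background cluster."""
    height, width, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(float)
    labels, _ = kmeans_labels(pixels, k)
    background = labels[0]  # assumption: the top-left pixel is background
    return (labels != background).reshape(height, width).astype(np.uint8)


if __name__ == "__main__":
    image = np.full((30, 30, 3), 240, dtype=np.uint8)   # light background
    image[10:20, 10:20] = (180, 160, 40)                # synthetic fruit
    image[13:17, 13:17] = (120, 90, 20)                 # darker patch on the fruit
    print(segment_fruit(image).sum())
```

Using more than two clusters lets shaded and highlighted parts of the fruit fall into separate clusters while still being merged into one foreground mask.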

Determination and analysis of physical characteristics
Some scholars align the apparent size positions (such as the maximum and minimum diameters) with the horizontal or vertical direction of the image when making image measurements. This is impractical, because it cannot be guaranteed that the fruit is placed exactly in that orientation. Therefore, we use the approximate axisymmetric method.
Each Rosa roxburghii fruit is approximated as an ellipsoid. Each of the three projection surfaces is then treated as an ellipse, and the largest half-long axis and half-short axis among the three ellipses are taken as the half-long axis (a) and half-short axis (b) of the fruit. The principle is shown in Fig. 5. When the ellipse is oriented along the horizontal line, the row through its center contains the largest number of fruit pixels, so the horizontal line segment there is the longest. By traversing each row of 0/1 values of the target binary image in the horizontal direction, the row in which the ellipse has the largest number of pixels is found. When photographed, the fruits are not guaranteed to have their long or short axis on the horizontal, so the image is rotated several times, and the maximum horizontal segment is measured after each rotation.
As shown in Fig. 5a-d, the lengths of the four line segments differ. The largest of the per-rotation maxima (L1) is the length in pixels of the long axis, and the smallest of the per-rotation maxima (L3) is the length in pixels of the short axis. By calibrating the camera, the pixel values per unit area and unit length are known, so these pixel lengths can be converted to actual lengths.
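The rotate-and-scan measurement can be sketched in NumPy as below. This is an approximation written for illustration, not the authors' code: instead of resampling the image at every angle, it rotates the coordinates of the fruit pixels and takes the horizontal extent of the pixels in each one-pixel-high row, which equals the row's run length only for convex shapes such as an ellipse. The function names and the 5° rotation step are assumptions.

```python
import numpy as np


def max_horizontal_chord(mask, angle_deg):
    """Longest horizontal segment (in pixels) of the shape after rotating it."""
    ys, xs = np.nonzero(mask)
    t = np.deg2rad(angle_deg)
    # Rotate the pixel coordinates rather than resampling the image.
    xr = xs * np.cos(t) + ys * np.sin(t)
    yr = -xs * np.sin(t) + ys * np.cos(t)
    # Group pixels into one-pixel-high rows of the rotated frame.
    rows = np.round(yr).astype(int)
    rows -= rows.min()
    lo = np.full(rows.max() + 1, np.inf)
    hi = np.full(rows.max() + 1, -np.inf)
    np.minimum.at(lo, rows, xr)
    np.maximum.at(hi, rows, xr)
    # Extent of the widest row, counted inclusively.
    return float(np.max(hi - lo)) + 1.0


def measure_axes(mask, step_deg=5):
    """Return (long axis, short axis) in pixels.

    The long axis is the largest per-rotation maximum (L1 in the text) and
    the short axis is the smallest per-rotation maximum (L3 in the text).
    """
    chords = [max_horizontal_chord(mask, a) for a in range(0, 180, step_deg)]
    return max(chords), min(chords)


if __name__ == "__main__":
    yy, xx = np.mgrid[0:101, 0:101]
    ellipse = ((xx - 50) / 40.0) ** 2 + ((yy - 50) / 25.0) ** 2 <= 1.0
    print(measure_axes(ellipse))
```

For a synthetic ellipse with semi-axes of 40 and 25 pixels, the two returned values should lie close to 80 and 50 pixels; a smaller rotation step tightens the estimate at the cost of more passes over the image.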
Projected areas can be calculated from dimensional information, but the formulas assume that the contours are regular geometric shapes 10. However, most fruits, such as sweet potatoes and mangoes, have irregular contours, and their projected areas can instead be obtained from images 7,32. First, the fruit pixels in the segmented binary (0/1) image of each Rosa roxburghii fruit are counted; then, using the pre-measured pixel ratio per unit area, the projected areas of three mutually perpendicular views are obtained, denoted PA1, PA2, and PA3. PA1 and PA2 are the side-view surfaces along the X-axis and Y-axis, respectively, and PA3 is the top-view surface along the Z-axis. The criterion projected area (CPA) is the average of the three projected areas:

$$\mathrm{CPA} = \frac{PA_1 + PA_2 + PA_3}{3} \tag{1}$$

Table 2 shows the maximum, minimum, and average values of the dimensions and projected areas determined in the above ways, together with the mass measured on the digital balance and the volume measured by the water displacement method. The table also shows that the difference between the areas PA1 and PA2 is not significant. The two are strongly interrelated because these two side-view surfaces are perpendicular to each other and have similar characteristics. The correlation coefficients between the measured physical characteristics are shown in Fig. 6. The correlation coefficients between the measured half-short axis (b) and the other data were lower (0.82-0.91), while those for the other parameters ranged from 0.92 to 0.98. This information will help in the development and design of grading equipment 33.

Evaluation of image measurement methods
The dimensions and projected areas were obtained under the same shooting conditions. Therefore, it is only necessary to verify that the area obtained by the pixel value method is consistent with the actual area. We first applied the pixel value method: the maximum top-view surface of each of 16 Rosa roxburghii fruits was photographed, and the pixel values were extracted and converted to area. Then, we measured the actual area using the slice-and-integrate method 34. The principle is shown in Fig. 7a. The fruit is sliced successively along the Z-axis, in planes parallel to the XY-plane. The area of each slice is calculated on coordinate paper, as shown in Fig. 7b. The actual area of the fruit is the integral area of the largest slice, computed with the integration formula in Eq. (2), where Δθ_i is the step angle, r_i^j is the radius at that step angle, and n is the total number of steps. Figure 7c shows the absolute error (AE) and relative error (RE) between the 16 areas obtained by the pixel value method and the actual areas. The mean relative error (MRE) was used as the main criterion for evaluating the accuracy of the image measurements, as in Eq. (3), where pv is the image measurement, mv is the slice-and-integrate measurement, and n is the total number of measurements; the MRE was 1.22%. The results show that the pixel value method reflects the actual projected area and dimensions of Rosa roxburghii fruits well.
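For readers who want to reproduce the check, the sketch below shows one plausible reading of the two quantities. Both forms are assumptions, since the equations themselves are not reproduced here: the slice area is taken as a sum of circular sectors, A ≈ Σ ½·r_i²·Δθ_i, and the MRE as the mean of |pv − mv| / mv expressed in percent. The function names are hypothetical.

```python
import math


def polar_slice_area(radii, step_angles_rad):
    """Approximate the area of a slice outline from polar samples.

    Assumed form: each step contributes a circular sector of area
    0.5 * r_i**2 * dtheta_i.
    """
    return sum(0.5 * r * r * dtheta for r, dtheta in zip(radii, step_angles_rad))


def mean_relative_error(pv, mv):
    """Mean relative error in percent (assumed form).

    pv: image-based measurements; mv: slice-and-integrate measurements.
    """
    return 100.0 * sum(abs(p - m) / m for p, m in zip(pv, mv)) / len(pv)


if __name__ == "__main__":
    n = 360
    # A circle of radius 2 sampled in equal angular steps has area 4*pi.
    print(polar_slice_area([2.0] * n, [2.0 * math.pi / n] * n))
    print(mean_relative_error([101.0, 99.0], [100.0, 100.0]))
```

With equal angular steps the sector sum is exact for a circle, which makes a circle a convenient sanity check before applying the function to traced fruit outlines.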
where Y is the dependent variable, which can be mass M (g) or volume V (ml); X is the physical parameter used as the independent variable, which can be a, b, PA1, PA2, PA3, or CPA; and k1, k2 and k3 are fitting constants.
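The four univariate model families (linear, quadratic, exponential, and power) can be fitted with a few lines of NumPy. The functional forms below are the usual ones and are assumed here: Y = k1 + k2·X, Y = k1 + k2·X + k3·X², Y = k1·exp(k2·X), and Y = k1·X^k2. The exponential and power fits use a log-linearized least-squares fit, which is a simplification and can give slightly different constants from a direct nonlinear fit. Function names are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Y = k1 + k2*X. Returns (k1, k2)."""
    k2, k1 = np.polyfit(x, y, 1)
    return k1, k2


def fit_quadratic(x, y):
    """Y = k1 + k2*X + k3*X**2. Returns (k1, k2, k3)."""
    k3, k2, k1 = np.polyfit(x, y, 2)
    return k1, k2, k3


def fit_exponential(x, y):
    """Y = k1*exp(k2*X), fitted as ln Y = ln k1 + k2*X. Requires y > 0."""
    k2, ln_k1 = np.polyfit(x, np.log(y), 1)
    return np.exp(ln_k1), k2


def fit_power(x, y):
    """Y = k1*X**k2, fitted as ln Y = ln k1 + k2*ln X. Requires x, y > 0."""
    k2, ln_k1 = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(ln_k1), k2


if __name__ == "__main__":
    x = np.linspace(5.0, 15.0, 20)
    y = 0.280 + 0.940 * x + 0.071 * x ** 2   # noise-free synthetic data
    print(fit_quadratic(x, y))
```

Fitting noise-free synthetic data generated from known constants, as in the usage lines, is a quick way to confirm that each routine returns its constants in the expected order.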
The half-long axis a and half-short axis b obtained from the images are used as independent variables to complete the mass and volume modeling with the mathematical models in Eqs. (4)-(7). In addition, multiple regression models are fitted based on the two independent variables, as in Eqs. (8) and (9) 36:

$$M = k_1 + k_2 a + k_3 b \tag{8}$$

$$V = k_1 + k_2 a + k_3 b \tag{9}$$

where k1, k2 and k3 are regression constants.

Projected area-based estimative modeling
The three projected areas obtained from the images were used as independent variables to complete the mass and volume modeling with the mathematical models in Eqs. (4)-(7). In addition, multiple regression models are fitted based on the three independent variables, as in Eqs. (10) and (11):

$$M = k_1 + k_2 PA_1 + k_3 PA_2 + k_4 PA_3 \tag{10}$$

$$V = k_1 + k_2 PA_1 + k_3 PA_2 + k_4 PA_3 \tag{11}$$

where k1, k2, k3 and k4 are regression constants.
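A multiple linear regression of this kind reduces to an ordinary least-squares problem. The sketch below uses `numpy.linalg.lstsq` and is offered as an illustration only, since the study reports using analytical software such as Matlab; the function name and the synthetic data are assumptions.

```python
import numpy as np


def fit_multiple_linear(features, target):
    """Least-squares fit of target = k1 + k2*x1 + k3*x2 + ...

    features: array of shape (n_samples, n_features), e.g. columns PA1, PA2, PA3.
    Returns the constants as [k1, k2, k3, ...], intercept first.
    """
    design = np.column_stack([np.ones(len(target)), features])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    areas = rng.uniform(5.0, 15.0, size=(30, 3))         # synthetic PA1, PA2, PA3
    volume = -8.467 + areas @ np.array([0.657, 1.294, 0.628])
    print(fit_multiple_linear(areas, volume))
```

The same function covers the two-variable dimension-based models by passing a two-column array of a and b.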

Geometric volume-based estimative modeling
In the third classification, the Rosa roxburghii fruits are first assumed to be ellipsoidal or parabolic in shape, and their geometric volumes are computed from the measured dimensions (half-long axis a and half-short axis b) as in Eqs. (12) and (13). The volume obtained from each of these two equations is used as the independent variable, and the mass and volume modeling is then completed with the mathematical models in Eqs. (4)-(7).

Plant guideline statement
We confirm that all the experimental research and field studies on plants (either cultivated or wild), including the collection of plant material, complied with relevant institutional, national, and international guidelines and legislation. The Rosa roxburghii in this study is not a United Nations endangered species. All of the material is owned by the authors and/or no permissions are required. The plant specimens for this study are deposited in the Chinese Virtual Herbarium at https://www.cvh.ac.cn/spms/detail.php?id=f7e5c506. The collection barcode is GZTM0066333 and the identifier is Weike Jiang.

Dimension-based mass and volume models
The models were fitted and the sample data were analyzed with analytical software such as Matlab. The coefficient of determination (R²), chi-square (χ²), and root mean square error (RMSE) were chosen as criteria for evaluating the regression models' applicability. The model with the larger R² and the smaller χ² and RMSE was selected as the most appropriate.
Table 3 shows the estimation models for mass and volume based on dimensions. As in Eqs. (14) and (15), the multiple regression model based on the two dimensions (a and b) had the largest R² and the smallest χ² and RMSE, and was the optimal dimension-based model for mass (R² = 0.948, χ² = 5.38, RMSE = 1.22) and volume (R² = 0.896, χ² = 11.63, RMSE = 1.78). As shown in Fig. 8, among the single-factor models, the quadratic model based on the half-long axis (a) estimates mass and volume better.
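The three criteria can be computed as in the following sketch. R² and RMSE follow their standard definitions. The exact χ² formula is not given in the text, so the Pearson-type form Σ(observed − predicted)² / predicted used here is an assumption, and the function names are hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def chi_square(observed, predicted):
    """Pearson-type chi-square (assumed form; requires predicted > 0)."""
    return float(np.sum((observed - predicted) ** 2 / predicted))


if __name__ == "__main__":
    obs = np.array([10.0, 12.0, 14.0, 16.0])
    pred = np.array([11.0, 11.0, 15.0, 15.0])
    print(r_squared(obs, pred), rmse(obs, pred), chi_square(obs, pred))
```

Reporting all three together is useful because R² alone can stay high while the absolute error (RMSE) is still too large for grading.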

Projected area-based mass and volume models
Table 4 shows the estimation models based on the projected area. As in Eq. (16), the quadratic model based on CPA is the optimal mass model, with the largest R² (0.981) and the smallest χ² (1.82) and RMSE (0.73). As in Eq. (17), the multiple regression model based on the three projected areas (PA1, PA2, and PA3) is the optimal volume model, with the largest R² (0.898) and the smallest χ² (11.83) and RMSE (1.76). Both models require all three views to be measured simultaneously. Therefore, when only single-factor models are considered, as in Fig. 9, the quadratic model based on PA3 estimates mass better, and the quadratic model based on PA2 estimates volume better.

$$M = 0.280 + 0.940\,\mathrm{CPA} + 0.071\,\mathrm{CPA}^2 \tag{16}$$

$$V = -8.467 + 0.657\,PA_1 + 1.294\,PA_2 + 0.628\,PA_3 \tag{17}$$

Table 3. Dimension-based estimation models for mass and volume. a is the half-long axis and b is the half-short axis. R² is the coefficient of determination, χ² is chi-square, and RMSE is the root mean square error.
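As a worked example, the two optimal models reported in this study, M = 0.280 + 0.940CPA + 0.071CPA² and V = −8.467 + 0.657PA1 + 1.294PA2 + 0.628PA3, can be evaluated directly. The coefficients are the published ones; the function names are hypothetical, and the input values in the usage lines are made-up numbers chosen only to show the arithmetic. The area units must match those used when the models were fitted, which are not stated in this section.

```python
def criterion_projected_area(pa1, pa2, pa3):
    """CPA: the average of the three projected areas."""
    return (pa1 + pa2 + pa3) / 3.0


def mass_from_cpa(cpa):
    """Quadratic mass model with the reported coefficients (mass in g)."""
    return 0.280 + 0.940 * cpa + 0.071 * cpa ** 2


def volume_from_projected_areas(pa1, pa2, pa3):
    """Multiple regression volume model with the reported coefficients (volume in ml)."""
    return -8.467 + 0.657 * pa1 + 1.294 * pa2 + 0.628 * pa3


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the study.
    pa1, pa2, pa3 = 8.0, 8.0, 10.0
    print(mass_from_cpa(criterion_projected_area(pa1, pa2, pa3)))
    print(volume_from_projected_areas(pa1, pa2, pa3))
```

Both functions need all three projected areas, which is why three camera views are required for the optimal models.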

Geometric volume-based mass and volume models
Table 5 shows the estimation models based on geometric volumes. Among the mass models, the quadratic model based on the ellipsoid method (Eq. 18) is the most appropriate. Figure 10 shows the fitted plot; this model has the largest R² (0.967) and the smallest χ² (3.34) and RMSE (0.98). Among the volume models, the volume computed by the ellipsoid method correlated better with the actual volume of the fruit (R² = 0.902), but its RMSE (7.95) was too large to allow an accurate estimation of the fruit volume. The RMSE of the volume computed by the parabolic method (2.11) was smaller, so, judged by the computed error, the parabolic volume was closer to the actual volume of the Rosa roxburghii fruit.

Evaluation of three estimation models
In Tables 3, 4, and 5, we developed estimation models for Rosa roxburghii mass and volume based on dimensions, projected area, and geometric volume. Among the mass estimation models established in these three classifications, the optimal one is the quadratic model based on the criterion projected area (CPA), as in Eq. (16), with R² = 0.981 and RMSE = 0.73. Among the volume estimation models in these three classifications, the optimal one is the multiple regression model based on the three projected areas (PA1, PA2, and PA3), as in Eq. (17), with R² = 0.898 and RMSE = 1.76. Similar studies on other fruits have also found that multiple regression models and models based on the criterion projected area show high correlation and better reflect the relationship between the measured physical characteristics and the mass and volume of the fruits. For example, multiple regression models for apple ber (Ziziphus mauritiana L.) predicted mass and volume with R² of 0.935 and 0.950, respectively 33. The multiple regression model for elephant apple (Dillenia indica L.) had an R² of 0.935 for predicting mass 18.
The optimal models for estimating mass and volume above require multiple factors to be involved in the calculation simultaneously, which requires three camera positions. However, in practical applications, using only one camera position can save costs. In addition, when sorting large quantities, using physical features extracted from the same image to estimate mass and volume simultaneously enables fast computer processing 25. Therefore, in cases where the impact on estimation accuracy is not significant, estimating both mass and volume solely from physical information obtained from a single image is recommended. When we photographed the three faces of Rosa roxburghii, the two side-view surfaces along the X and Y axes had highly similar measured areas. Therefore, if a side-view camera position is used, it is recommended to estimate mass (R² = 0.948, RMSE = 1.22) and volume (R² = 0.896, RMSE = 1.78) with the multiple regression model based on the two dimensions (a and b). If a top-view camera position along the Z-axis is used, it is recommended to estimate mass (R² = 0.965, RMSE = 1.01) and volume (R² = 0.860, RMSE = 2.07) with the quadratic model based on PA3.
We compare the optimal estimates in the model with the recommended estimates and also compare the accuracy (φ) of each estimation model, as in Eq. (19). Figure 11a shows the comparison between the estimated mass and the mass measured by the scale.The two recommended mass estimation models are not very different from the optimal estimation model.The optimal estimation model based on CPA has an accuracy of 99.27%, the

Conclusions
In this paper, we propose an approach to estimate the mass and volume of Rosa roxburghii by obtaining the dimensions and projected areas of the fruit from two-dimensional images. For the dimensional information, we propose approximating Rosa roxburghii as an ellipsoid so that its half-long and half-short axes can be measured in any placement position. The reliability of the image measurement method was also verified: the mean relative error (MRE) was 1.22% compared with actual measurements made by the slicing method, which shows that the method reflects the dimensional information and projected area of Rosa roxburghii well. In the mass and volume estimation models, the independent variables of the best estimation models must be obtained simultaneously from three camera positions. The best estimation model for mass is the quadratic model based on the criterion projected area (the average of the three projected areas), R² = 0.981, and the equation is M = 0.280 + 0.940CPA + 0.071CPA². The best estimation model for volume is the multiple regression model based on the three projected areas, R² = 0.898, and the equation is V = −8.467 + 0.657PA1 + 1.294PA2 + 0.628PA3. From an economic point of view, using only one camera position can save costs, and it is recommended to estimate both mass (R² = 0.948) and volume (R² = 0.896) based on the dimensional information of the side-view plane (multiple regression model for a and b).

Figure 1. General process for image-based mass and volume estimation of Rosa roxburghii fruits.

Figure 2. Image acquisition device and equipment.

Figure 3. (a) Original image; (b) image after adjusting saturation and contrast; (c) de-pricking and denoising; (d) gray-level thresholding for further shadow removal; (e) edge detection and generation of voids; (f) filling of voids; (g) final binary image.

Figure 5. Measuring dimensions by the approximate ellipse method.

Figure 6. Correlation between the measured physical characteristics of Rosa roxburghii fruits. a is the half-long axis and b is the half-short axis. PA projected area. CPA criterion projected area.

Figure 7. (a) Schematic of slicing (b) slice-and-integrate method of measuring area (c) error evaluation.

Figure 8. (a) Estimated mass quadratic model based on the half-long axis (a); (b) estimated volume quadratic model based on the half-long axis (a). R² is coefficient of determination. RMSE root mean square error.

Figure 9. (a) Estimated quadratic model of mass based on PA 3 (b) estimated quadratic model of volume based on PA 2 .PA projected area.R 2 is coefficient of determination.RMSE root mean square error.

Figure 10. Estimated quadratic model of mass based on ellipsoidal volume (V ellip). R² is coefficient of determination. RMSE root mean square error.

Figure 11. Comparison of recommended and optimal estimates of mass and volume with measured values: (a) quadratic model based on CPA, multiple regression model based on a and b, quadratic model based on PA3; (b) multiple regression model based on PA1, PA2, and PA3, multiple regression model based on a and b, quadratic model based on PA3. a is the half-long axis and b is the half-short axis. PA projected area. CPA criterion projected area.

Table 1. Parameter values for three categories of Rosa roxburghii fruits.

Table 2. Data on physical characteristics of Rosa roxburghii fruits. a is the half-long axis and b is the half-short axis. PA projected area, CPA criterion projected area.

Table 5. Geometric volume-based estimation models for mass and volume. V ellip is ellipsoidal volume, V para is parabolic volume, R² is coefficient of determination, χ² is chi-square, RMSE root mean square error.